其他
R自学笔记|双因素方差分析+箱线图字母标注
近日,在学习中遇到了一个问题,问题如下:
我想要绘制这样的图形,按分组在组内标注出差异显著性,要求字母标注,如下图。
碰到的问题是,箱线图展示出来的数据并非是平均值的值,而显著性字母标记是在分组统计到最终的平均值后得到的,与平均值一一对应,所以这里画完箱线图,想要直接将字母标注映射到图中是不可能,会报错,报错提示如下:
Error in `geom_text()`:
! Problem while computing aesthetics.
ℹ Error occurred in the 2nd layer.
Caused by error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (18)
✖ Fix the following mappings: `label`
Run `rlang::last_trace()` to see where the error occurred.
虽然看不大懂,但是大概知道是映射出的问题,我的理解就是字母标注和数据未能做到一一对应。
摸索了好一阵子,通过网络搜索学习,最终解决方案如下:
首先,我构建一个示例数据,纯属乱编,仅为这个案例使用。
# 构建数据
group = c(rep("a", 9), rep("b", 9))
Treatment = rep(c("T1","T2","T3"), each = 3, 2)
Block = rep(c("B1","B2","B3"), 6)
V = c(1.5,1.8,1.1,2.6,3.2,3.8,3.7,4.1,4.9,2.7,3.4,2.1,4.1,4.9,5.2,5.8,6.2,6.5)
sample.boxplot <- data.frame(group, Treatment, Block, V)
sample.boxplot
## group Treatment Block V
## 1 a T1 B1 1.5
## 2 a T1 B2 1.8
## 3 a T1 B3 1.1
## 4 a T2 B1 2.6
## 5 a T2 B2 3.2
## 6 a T2 B3 3.8
## 7 a T3 B1 3.7
## 8 a T3 B2 4.1
## 9 a T3 B3 4.9
## 10 b T1 B1 2.7
## 11 b T1 B2 3.4
## 12 b T1 B3 2.1
## 13 b T2 B1 4.1
## 14 b T2 B2 4.9
## 15 b T2 B3 5.2
## 16 b T3 B1 5.8
## 17 b T3 B2 6.2
## 18 b T3 B3 6.5
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
先进行方差分析。
sample.boxplot.aov <- aov(V ~ Block + group * Treatment, data = sample.boxplot)
anova(sample.boxplot.aov)
## Analysis of Variance Table
##
## Response: V
## Df Sum Sq Mean Sq F value Pr(>F)
## Block 2 1.1378 0.5689 2.4568 0.1355
## group 1 11.2022 11.2022 48.3781 3.921e-05 ***
## Treatment 2 29.2311 14.6156 63.1190 2.131e-06 ***
## group:Treatment 2 0.3378 0.1689 0.7294 0.5062
## Residuals 10 2.3156 0.2316
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
从结果可以看出主效应group和Treatment分别对V影响显著,无显著交互作用。
进一步进行主效应的多重比较。
library(agricolae)
# 主效应group
sampleout1 <- LSD.test(sample.boxplot.aov,"group",p.adj="none")
sampleout1
## $statistics
## MSerror Df Mean CV t.value LSD
## 0.2315556 10 3.755556 12.81308 2.228139 0.505433
##
## $parameters
## test p.ajusted name.t ntr alpha
## Fisher-LSD none group 2 0.05
##
## $means
## V std r se LCL UCL Min Max Q25 Q50 Q75
## a 2.966667 1.296148 9 0.1604007 2.609272 3.324062 1.1 4.9 1.8 3.2 3.8
## b 4.544444 1.564538 9 0.1604007 4.187049 4.901840 2.1 6.5 3.4 4.9 5.8
##
## $comparison
## NULL
##
## $groups
## V groups
## b 4.544444 a
## a 2.966667 b
##
## attr(,"class")
## [1] "group"
# 主效应Treatment
sampleout2 <- LSD.test(sample.boxplot.aov,"Treatment",p.adj="none")
sampleout2
## $statistics
## MSerror Df Mean CV t.value LSD
## 0.2315556 10 3.755556 12.81308 2.228139 0.6190265
##
## $parameters
## test p.ajusted name.t ntr alpha
## Fisher-LSD none Treatment 3 0.05
##
## $means
## V std r se LCL UCL Min Max Q25 Q50 Q75
## T1 2.100000 0.8366600 6 0.19645 1.662282 2.537718 1.1 3.4 1.575 1.95 2.55
## T2 3.966667 0.9892758 6 0.19645 3.528949 4.404384 2.6 5.2 3.350 3.95 4.70
## T3 5.200000 1.1489125 6 0.19645 4.762282 5.637718 3.7 6.5 4.300 5.35 6.10
##
## $comparison
## NULL
##
## $groups
## V groups
## T3 5.200000 a
## T2 3.966667 b
## T1 2.100000 c
##
## attr(,"class")
## [1] "group"
因为最终成图的字母标注是在各组内标记的,所以这里我对各组分别进行单因素方差分析,并多重比较。
# a组组内处理差异
sample.boxplota <- sample.boxplot %>% filter(group == "a")
sample.boxplota
## group Treatment Block V
## 1 a T1 B1 1.5
## 2 a T1 B2 1.8
## 3 a T1 B3 1.1
## 4 a T2 B1 2.6
## 5 a T2 B2 3.2
## 6 a T2 B3 3.8
## 7 a T3 B1 3.7
## 8 a T3 B2 4.1
## 9 a T3 B3 4.9
sample.boxplota.aov <- aov(V ~ Block + Treatment, data = sample.boxplota)
anova(sample.boxplota.aov)
## Analysis of Variance Table
##
## Response: V
## Df Sum Sq Mean Sq F value Pr(>F)
## Block 2 0.6867 0.3433 1.3377 0.359067
## Treatment 2 11.7267 5.8633 22.8442 0.006481 **
## Residuals 4 1.0267 0.2567
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
sampleouta <- LSD.test(sample.boxplota.aov, "Treatment", p.adj="none")
sampleouta
## $statistics
## MSerror Df Mean CV t.value LSD
## 0.2566667 4 2.966667 17.07717 2.776445 1.148493
##
## $parameters
## test p.ajusted name.t ntr alpha
## Fisher-LSD none Treatment 3 0.05
##
## $means
## V std r se LCL UCL Min Max Q25 Q50 Q75
## T1 1.466667 0.3511885 3 0.2924988 0.6545598 2.278774 1.1 1.8 1.3 1.5 1.65
## T2 3.200000 0.6000000 3 0.2924988 2.3878931 4.012107 2.6 3.8 2.9 3.2 3.50
## T3 4.233333 0.6110101 3 0.2924988 3.4212264 5.045440 3.7 4.9 3.9 4.1 4.50
##
## $comparison
## NULL
##
## $groups
## V groups
## T3 4.233333 a
## T2 3.200000 a
## T1 1.466667 b
##
## attr(,"class")
## [1] "group"
# b组组内处理差异
sample.boxplotb <- sample.boxplot %>% filter(group == "b")
sample.boxplotb
## group Treatment Block V
## 1 b T1 B1 2.7
## 2 b T1 B2 3.4
## 3 b T1 B3 2.1
## 4 b T2 B1 4.1
## 5 b T2 B2 4.9
## 6 b T2 B3 5.2
## 7 b T3 B1 5.8
## 8 b T3 B2 6.2
## 9 b T3 B3 6.5
sample.boxplotb.aov <- aov(V ~ Block + Treatment, data = sample.boxplotb)
anova(sample.boxplotb.aov)
## Analysis of Variance Table
##
## Response: V
## Df Sum Sq Mean Sq F value Pr(>F)
## Block 2 0.6156 0.3078 1.0949 0.417616
## Treatment 2 17.8422 8.9211 31.7352 0.003515 **
## Residuals 4 1.1244 0.2811
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
sampleoutb <- LSD.test(sample.boxplotb.aov, "Treatment", p.adj="none")
sampleoutb
## $statistics
## MSerror Df Mean CV t.value LSD
## 0.2811111 4 4.544444 11.66697 2.776445 1.201939
##
## $parameters
## test p.ajusted name.t ntr alpha
## Fisher-LSD none Treatment 3 0.05
##
## $means
## V std r se LCL UCL Min Max Q25 Q50 Q75
## T1 2.733333 0.6506407 3 0.3061106 1.883434 3.583233 2.1 3.4 2.4 2.7 3.05
## T2 4.733333 0.5686241 3 0.3061106 3.883434 5.583233 4.1 5.2 4.5 4.9 5.05
## T3 6.166667 0.3511885 3 0.3061106 5.316767 7.016566 5.8 6.5 6.0 6.2 6.35
##
## $comparison
## NULL
##
## $groups
## V groups
## T3 6.166667 a
## T2 4.733333 b
## T1 2.733333 c
##
## attr(,"class")
## [1] "group"
进一步的对原始数据进行分组统计构建新的数据集sample.boxplot1。
# 分组统计数据,新建数据集sample.boxplot1
sample.boxplot1 <- sample.boxplot %>% group_by(group, Treatment) %>%
summarise(vmean = mean(V), vsd = sd(V))
## `summarise()` has grouped output by 'group'. You can override using the
## `.groups` argument.
sample.boxplot1
## # A tibble: 6 × 4
## # Groups: group [2]
## group Treatment vmean vsd
## <chr> <chr> <dbl> <dbl>
## 1 a T1 1.47 0.351
## 2 a T2 3.2 0.6
## 3 a T3 4.23 0.611
## 4 b T1 2.73 0.651
## 5 b T2 4.73 0.569
## 6 b T3 6.17 0.351
# 查看之前组内单因素方差分析结果的字母标注
sampleouta$groups
## V groups
## T3 4.233333 a
## T2 3.200000 a
## T1 1.466667 b
sampleoutb$groups
## V groups
## T3 6.166667 a
## T2 4.733333 b
## T1 2.733333 c
# 将字母标注添加到sample.boxplot1
sample.boxplot1 <- data.frame(sample.boxplot1,
vsig = c("b", "a", "a", "c", "b", "a"))
sample.boxplot1
## group Treatment vmean vsd vsig
## 1 a T1 1.466667 0.3511885 b
## 2 a T2 3.200000 0.6000000 a
## 3 a T3 4.233333 0.6110101 a
## 4 b T1 2.733333 0.6506407 c
## 5 b T2 4.733333 0.5686241 b
## 6 b T3 6.166667 0.3511885 a
绘图
# 绘图并添加字母标注
sample.boxplot %>%
ggplot(aes(x = Treatment, y = V, fill = group)) +
geom_boxplot() +
geom_point(data = sample.boxplot1,
aes(x = Treatment, y = vmean),
position = position_dodge(0.8), color = "white") +
geom_text(data = sample.boxplot1, aes(x = Treatment, y = vmean + 0.8,
label = vsig),position = position_dodge(0.8))
调整主题美化图形。
# 主题调整美化
sample.boxplot %>%
ggplot(aes(x = Treatment, y = V, fill = group)) +
geom_boxplot() +
geom_point(data = sample.boxplot1,
aes(x = Treatment, y = vmean),
position = position_dodge(0.8), color = "white") +
geom_text(data = sample.boxplot1, aes(x = Treatment, y = vmean + 0.8, label = vsig),
position = position_dodge(0.8)) +
theme_bw() +
theme(text = element_text(size = 14, family = "serif"),
axis.text = element_text(family = "serif", colour = "black")) +
theme(legend.position = "top", legend.title = element_blank(),
text = element_text(size = 14, family = "serif"),
axis.text = element_text(size = 14, family = "serif", colour = "black"))